Abstract

Here, we give an algorithm for deciding if the nonnegative rank of a matrix M of dimension 
m x n is at most r which runs in time $(nm)^{O(r^2)}$. This is the first exact algorithm that runs 
in time singly-exponential in r. This algorithm (and earlier algorithms) are built on methods 
for finding a solution to a system of polynomial inequalities (if one exists). Notably, the best 
algorithms for this task run in time exponential in the number of variables but polynomial in 
all of the other parameters (the number of inequalities and the maximum degree). 

Hence these algorithms motivate natural algebraic questions whose solutions have immediate 
algorithmic implications: How many variables do we need to represent the decision problem, 
does M have nonnegative rank at most r? A naive formulation uses nr + mr variables and yields 
an algorithm that is exponential in n and m even for constant r. (Arora, Ge, Kannan, Moitra, 
STOC 2012) [1] recently reduced the number of variables to $2r^2 2^r$, and here we exponentially 
reduce the number of variables to $2r^2$ and this yields our main algorithm. In fact, the algorithm 
that we obtain is nearly-optimal (under the Exponential Time Hypothesis) since an algorithm 
that runs in time $(nm)^{o(r)}$ would yield a subexponential algorithm for 3-SAT [1]. 

Our main result is based on establishing a normal form for nonnegative matrix factorization 
- which in turn allows us to exploit algebraic dependence among a large collection of linear 
transformations with variable entries. Additionally, we also demonstrate that nonnegative rank 
cannot be certified by even a very large submatrix of M, and this property also follows from 
the intuition gained from viewing nonnegative rank through the lens of systems of polynomial 
inequalities. 

1 Introduction 



1.1 Background 

The nonnegative rank of a matrix is a fundamental parameter that arises throughout algorithms 
and complexity and admits many equivalent formulations. In particular, given a nonnegative¹ 
matrix M of dimension m x n, its nonnegative rank is the smallest r for which: 

• M can be written as the product of nonnegative matrices A and W which have dimension 
m x r and r x n respectively 

• M can be written as the sum of r nonnegative rank one matrices 

• there are r nonnegative vectors $v_1, v_2, \ldots, v_r$ (of length m) such that the nonnegative hull of 
$\{v_1, v_2, \ldots, v_r\}$ contains all columns in M 

Throughout this paper, we will denote the nonnegative rank by $\mathrm{rank}_+(M)$ and we will refer to 
a factorization M = AW where A and W are nonnegative and have dimension m x r and r x n 
respectively as a nonnegative matrix factorization of inner-dimension r. 
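
The equivalence of the first two formulations above (a product AW versus a sum of r nonnegative rank one matrices) can be checked directly on a small instance. The following is a minimal Python sketch; the helper names (matmul, rank_one_sum, is_nmf) and the example factors are assumptions chosen for illustration and are not taken from the paper.

```python
def matmul(A, W):
    """Product of an m x r matrix A and an r x n matrix W (lists of rows)."""
    r, n = len(W), len(W[0])
    return [[sum(A[i][k] * W[k][j] for k in range(r)) for j in range(n)]
            for i in range(len(A))]

def rank_one_sum(A, W):
    """Sum over k of the outer product (column k of A) x (row k of W)."""
    m, r, n = len(A), len(W), len(W[0])
    total = [[0] * n for _ in range(m)]
    for k in range(r):
        for i in range(m):
            for j in range(n):
                total[i][j] += A[i][k] * W[k][j]
    return total

def is_nonnegative(X):
    """True if every entry of X is at least zero."""
    return all(x >= 0 for row in X for x in row)

def is_nmf(M, A, W):
    """True if M = AW with A and W entry-wise nonnegative."""
    return is_nonnegative(A) and is_nonnegative(W) and matmul(A, W) == M

# Hypothetical example: a 3 x 3 matrix with a factorization of inner-dimension 2.
A = [[1, 0], [0, 1], [1, 1]]
W = [[1, 2, 0], [0, 1, 3]]
M = matmul(A, W)
```

Here is_nmf(M, A, W) holds and rank_one_sum(A, W) agrees with M, so this M has nonnegative rank at most 2.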

Some of the most compelling applications of nonnegative rank are in machine learning, statistics, 
combinatorics and communication complexity. In machine learning, the benefit of requiring a 
matrix factorization M = AW to be nonnegative is that this factorization can then be interpreted 
probabilistically. A representative application comes from the domain of topic modeling, where M is 
chosen to be a so-called "term-by-document matrix": the entry in row i, column j is the frequency of 
occurrence of the i-th word in the j-th document. And computing a nonnegative matrix factorization 
of inner-dimension r is akin to finding a collection of r topics (which are each distributions on words) 
so that each document can be expressed as a convex combination of these r topics. Nonnegative 
matrix factorization has found applications throughout machine learning, from topic modeling to 
information retrieval to image segmentation and collaborative filtering. Even this is far from an 
exhaustive list. Of particular interest in these applications are instances of this 
problem in which the target nonnegative rank r is small. 

In combinatorial optimization, one is often interested in expressing a polytope P as the projection 
of a higher-dimensional polytope Q which (hopefully) has far fewer facets. The minimum 
number of facets needed is called the extension complexity of P and there is a rich body of literature 
on this subject. Yannakakis established a striking connection between extension complexity and 
nonnegative rank: Given the polytope P, one constructs the "slack matrix": the entry in row i, 
column j is how slack the i-th vertex is against the j-th constraint. Yannakakis proved that the 
nonnegative rank of the slack matrix is exactly equal to the extension complexity of P [18]. Fiorini 
et al [6] recently used this connection and results from quantum communication complexity to 
prove a remarkable lower bound, that the traveling salesman (TSP) polytope has no polynomial 
size extended formulation. 

In communication complexity, the famous Log Rank Conjecture of Lovász and Saks [10] asks if 
the log of the rank of the communication matrix and the deterministic communication complexity 
are polynomially related. In fact, an equivalent formulation of this problem (that follows from [2]) 
is that the Log Rank Conjecture asks if the log of the rank and the log of the nonnegative rank of a 
Boolean matrix are polynomially related. Of crucial importance here is that the matrix in question 
be Boolean. For a general matrix, there is no non-trivial relationship since there are examples in 
which the rank is three and yet the nonnegative rank is $\Omega(\sqrt{n})$ [7]. Also in complexity theory, Nisan 
used nonnegative rank to prove lower bounds for non-commutative models of computation [12]. 

¹ We will refer to a matrix that is entry-wise nonnegative as a "nonnegative matrix". 

We note that nonnegative matrix factorization has also been applied to problems in biology, 
economics and chemometrics to model all sorts of processes, ranging from stimulation in the visual 
cortex to the dynamics of marriage. In fact, a historical curiosity is that nonnegative rank was first 
introduced in chemometrics, under the name of self-modeling curve resolution. 

1.2 Systems of Polynomial Inequalities 

The focus of this paper is: 

Question. What is the complexity of computing the nonnegative rank? 

A priori it is not even clear that there is an algorithm that runs in any finite amount of 
time. But indeed, Cohen and Rothblum [5] observed that the decision question of whether or not 
$\mathrm{rank}_+(M) \leq r$ can be equivalently formulated as a system of O(mn) polynomial inequalities with 
mr + nr total variables: we can treat each entry in A and each entry in W as a variable, 
and the constraint that this be a valid nonnegative matrix factorization is exactly that A and W 
be nonnegative and that M = AW. The latter is a set of mn degree two constraints. It is easy to 
see that this system of polynomial inequalities has a solution if and only if $\mathrm{rank}_+(M) \leq r$. 
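
To make the size of this system concrete, the following Python sketch counts its variables and constraints and checks whether a candidate assignment of the variables is a solution. It is only an illustration under stated assumptions: the function names cohen_rothblum_system and satisfies_system are hypothetical, and checking one candidate point is of course not a decision procedure.

```python
def cohen_rothblum_system(M, r):
    """Return (variables, equalities, sign constraints) of the naive system:
    one variable per entry of A (m x r) and per entry of W (r x n)."""
    m, n = len(M), len(M[0])
    num_variables = m * r + n * r
    num_equalities = m * n             # one degree two constraint per entry of M
    num_sign_constraints = m * r + n * r  # every variable must be nonnegative
    return num_variables, num_equalities, num_sign_constraints

def satisfies_system(M, A, W):
    """Check a candidate assignment (A, W) against every constraint."""
    m, n, r = len(M), len(M[0]), len(W)
    signs_ok = (all(a >= 0 for row in A for a in row)
                and all(w >= 0 for row in W for w in row))
    # Each equality M[i][j] = sum_k A[i][k] * W[k][j] has degree two.
    equalities_ok = all(
        sum(A[i][k] * W[k][j] for k in range(r)) == M[i][j]
        for i in range(m) for j in range(n))
    return signs_ok and equalities_ok
```

For the all-ones 2 x 2 matrix and r = 1, the system has 4 variables, and the assignment A = [[1], [1]], W = [[1, 1]] satisfies it, whereas negating both factors preserves the product but violates the sign constraints.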

Moreover, whether or not a system of polynomial inequalities has a solution is decidable. This 
is a quite non-trivial statement. The first algorithm is due to Tarski [16], and there has since 
been a long line of improvements to this decision procedure. The best known algorithm is due to 
Renegar [13] and the running time of finding a solution to a system of p polynomial inequalities 
with k variables and maximum degree D is roughly 

$(pD)^{O(k)}$. 

So (appealing to decision procedures for a system of polynomial inequalities) there is an algorithm 
for computing the nonnegative rank of a matrix that runs in a finite amount of time. Note that if 
the target nonnegative rank r is small (say, three), this algorithm still runs in time exponential in 
m and n. And the question of whether or not there is a faster algorithm (in particular, one which 
runs in polynomial time for any constant r) was still open. Vavasis proved that nonnegative rank is 
NP-hard to compute [17], but this only rules out an exact algorithm that runs in time polynomial 
in n, m and r (if P ≠ NP). 

The crucial observation that the reader should keep in mind throughout this paper is that 
the main bottleneck in finding a solution to a system of polynomial inequalities is the number 
of variables. Renegar's algorithm [13] runs in time polynomial in the number of polynomials (p) 
and the maximum degree (D), but runs in time exponential in the number of variables (k). In a 
technical sense, the number of variables plays an analogous role to the VC-dimension in learning 
theory. (This connection can be made explicit by drawing an analogy between the Milnor-Thom 
and Warren Bounds and the Sauer-Shelah Lemma). 

Cohen and Rothblum [5] give a reduction from nonnegative rank to finding a solution to a 
system of polynomial inequalities that has mr + nr variables and a natural goal is to try to use 
fewer variables in this reduction. Arora et al [1]² do exactly this and give a reduction to a system 
with only $f(r) = 2r^2 2^r$ variables. This yields an exact algorithm for deciding if $\mathrm{rank}_+(M) \leq r$ 
that runs in time $(nm)^{O(f(r))}$, 
which is doubly exponential in r, but runs in polynomial time for any fixed r. Furthermore 
Arora et al [1] demonstrate that an exact algorithm for deciding if $\mathrm{rank}_+(M) \leq r$ that runs in 
time $(nm)^{o(r)}$ would yield a sub-exponential time algorithm for 3-SAT. In summary, there is an 
exact algorithm for deciding if $\mathrm{rank}_+(M) \leq r$ that runs in polynomial time for any r = O(1), and 
any algorithm must depend (at least) exponentially on r. However, the algorithm in [1] runs in 
time doubly exponential in r, and perhaps we could still hope for an algorithm that runs in time 
singly-exponential in r. Here, we give such an algorithm and we do this by reducing the number of 
variables exponentially from $2r^2 2^r$ to $2r^2$. 
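
Since the number of variables appears in the exponent of the running time, it is instructive to compare the three reductions numerically. The short Python sketch below simply evaluates the three counts quoted above; the function name variable_counts and the dictionary keys are assumptions made for this illustration.

```python
def variable_counts(m, n, r):
    """Number of variables in each reduction discussed in the text."""
    return {
        "naive": m * r + n * r,           # Cohen and Rothblum [5]
        "arora_et_al": 2 * r**2 * 2**r,   # Arora et al [1]
        "this_paper": 2 * r**2,           # the reduction given here
    }

# Example: a 100 x 100 matrix and target nonnegative rank r = 3.
counts = variable_counts(100, 100, 3)
```

Only the first count grows with m and n; the other two depend on r alone, which is why they give polynomial time algorithms for any fixed r.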

And perhaps the main message in this paper is that systems of polynomial inequalities with 
even just a small number of variables can be remarkably expressive! We believe that this theme 
may find other applications: Perhaps there are other problems for which one would like to design 
an algorithm based on solving some appropriately chosen system of polynomial inequalities. Then 
in this case, reducing the number of variables can drastically improve the running time of an 
algorithm. Indeed, maybe this complexity measure deserves to be studied in its own right: 

Meta Question. Given a decision problem, how many variables are needed to encode its answer 
as a system of polynomial inequalities? 

In particular, we want that the decision problem is a YES instance if and only if the corresponding 
system of polynomial inequalities has a solution. We note that this question probably makes the 
most sense and is the most promising in the context of geometric problems. (Indeed, nonnegative 
rank can be thought of in a purely geometric language and this is the view that will be most useful 
in our paper). 

1.3 Our Results 

We now state our main results: Let M be an m x n nonnegative matrix and let L denote the 
maximum bit complexity of any coefficient in M. We prove 

Theorem. There is a $\mathrm{poly}(n, m, L)\,(r 4^{r+1} mn)^{cr^2}$ time algorithm for deciding if the nonnegative 
rank of M is at most r. Additionally, given $\delta > 0$ (and if $\mathrm{rank}_+(M) \leq r$), the algorithm runs 
in time $\mathrm{poly}(n, m, L, \log\frac{1}{\delta})\,(r 4^{r+1} mn)^{cr^2}$ and returns factors $\tilde{A}$ and $\tilde{W}$ that are entry-wise close (within 
an additive $\delta$) to A and W (respectively) that are a nonnegative matrix factorization of M of 
inner-dimension at most r. Furthermore the entries of $\tilde{A}$ and $\tilde{W}$ have rational coordinates with 
numerators and denominators bounded in bit length by $O(L (r 4^{r+1} mn)^{cr^2} + \log\frac{1}{\delta})$. 

This is the first algorithm that runs in singly-exponential time as a function of r, and in fact 
is an exponential improvement over the previously best known algorithm due to Arora et al [1]. 
Moreover, notice that the algorithm in [1] is faster than the one in [5] only if r = O(log n) whereas 
our algorithm is in fact faster for any r = o(n). Our algorithm is nearly optimal (under the 
Exponential Time Hypothesis), since an exact algorithm that runs in time $(nm)^{o(r)}$ would yield a 
sub-exponential time algorithm for 3-SAT [1]. 

Our approach is based on two steps. First, we establish a "normal form" for nonnegative matrix 
factorization. We show that any nonnegative matrix factorization M = AW of inner-dimension r 
can be placed in a normal form (crucially, without changing the inner-dimension) so that a small 
subset of entries of A and W uniquely determine all of the remaining entries. More precisely, there 
are functions F and G (whose behavior only depends on an r x r submatrix of A and on an r x r 
submatrix of W respectively) such that F maps each column of M to the corresponding column of 
W and G maps each row of M to the corresponding row of A. 

² We remark that the present author is the last author on the paper [1]. However, the proofs that we present here 
will be self-contained. 

These functions F and G can be quite complicated when A or W do not have full column or row 
rank respectively. In the case that both A and W have full column and row rank, these functions 
are just linear transformations (see [1]). The difficulty is that when, say, A is rank deficient there 
are cases in which we need exponentially (in r) many linear transformations $T_1, T_2, \ldots, T_q$ so that the 
output of F is always the output of one of these linear transformations applied to a column of M. 
This is precisely the reason that the previous algorithm [1] ran in time doubly exponential in r - 
the number of variables is dominated by the number of linear transformations that we need, and 
in some cases we really do need exponentially many linear transformations to define the function 
F. Our approach to circumvent this problem is to exploit algebraic dependence among these linear 
transformations. In particular, our normal form allows us to show that the entries in these linear 
transformations can be defined as (ratios of) polynomial functions of a much smaller number of 
shared variables. This is an immediate corollary of our normal form and a simple application 
of Cramer's Rule. Hence we can reduce the number of variables (in the system of polynomial 
inequalities) from exponential in r to quadratic in r. 

We also consider another basic question about the nonnegative rank of a matrix: 

Question. Can the nonnegative rank of a matrix M be certified by a small submatrix? 

Indeed - in the case of the rank - a matrix M has rank at least r if and only if there is an 
r x r submatrix of M that has rank r. This property plays a crucial role in many applications 
[8] and it is natural to wonder if the nonnegative rank admits any similar characterization. As 
another motivation, often we are only given a subset of the entries of the matrix M (for example, 
in the Netflix problem) and we would like to use these entries to infer properties about M. Yet, 
the nonnegative rank behaves quite differently than the rank in this regard. 

Theorem. For any $r \in \mathbb{N}$, there is a 3rn x 3rn nonnegative matrix which has nonnegative rank at 
least 4r and yet for any set of fewer than n rows, the corresponding submatrix has nonnegative rank at most 3r. 

So even the submatrices consisting of a constant fraction of the rows in M do not determine the 
nonnegative rank of M even within a constant factor. This result, too, can be thought of in the 
language of systems of polynomial inequalities: The basic principle at play is that even though 
the nonnegative rank can be equivalently characterized by a system of polynomial inequalities with 
only 2r 2 variables, there are systems of polynomial inequalities that are together infeasible and 
yet any strict subset of the constraints is feasible. This is in stark contrast to the case of linear 
inequalities, for which, if the system is infeasible (and is in dimension d) there is a subset of just d 
linear inequalities that is infeasible (i.e. there is a size d obstruction) [11]. 

2 Computing the Nonnegative Rank 
2.1 Stability (A Normal Form) 

Throughout this paper, let M denote an entry-wise nonnegative matrix of dimension m x n. We 
will also let $M_i$ denote the i-th column of M and $M^j$ denote the j-th row. Given a subset $U \subseteq [n]$, 
we will let $M_U$ denote the submatrix consisting of columns of M from the set U (and similarly $M^V$ 
is a submatrix of rows of M). 

Definition 2.1. $\mathrm{rank}_+(M)$ is the smallest r such that M can be written as 

M = AW 

where A and W are nonnegative and have dimension m x r and r x n respectively. 

Additionally, we will call M = AW a nonnegative matrix factorization of inner-dimension r. 

Definition 2.2. $\mathrm{aff}(A) = \{Ax \mid x \geq 0\}$ 

(aff(A) is the nonnegative hull of the columns of A). 

Note: Given A, there is a nonnegative matrix W such that M = AW if and only if each column 
$M_i$ of M is contained in aff(A). 

Definition 2.3. Given A and a vector $v \in \mathbb{R}^m$ (recall A is dimension m x r), we will call a subset 
S of columns of A admissible if 

$v \in \mathrm{aff}(A_S)$. 

We will use this notion to place a stronger requirement on any nonnegative matrix factorization 
of M. It will not be immediately clear, but we will be able to add this requirement without loss 
of generality. 

Throughout this paper, we will make use of the lexicographic ordering on subsets of columns 
of A. The standard lexicographic ordering is often restricted to comparing two subsets of the same 
size, but here we will want an ordering on all subsets. We will simply impose that if $|S| < |T|$, S 
is before T in the lexicographic ordering. 

Let M = AW be a nonnegative matrix factorization. 

Definition 2.4. For each column $M_i$, let $S_i$ be the lexicographically first admissible subset (of 
columns of A) for $M_i$. Similarly, for each row $M^j$, let $T_j$ be the lexicographically first admissible 
subset (of rows of W) for $M^j$. We call M = AW stable if: 

1. for each i, $W_i$ is supported in $S_i$ 

2. and for each j, $A^j$ is supported in $T_j$. 

Next we show that a nonnegative matrix factorization of inner-dimension r can always be made 
stable (while preserving nonnegativity and the inner-dimension): 

Lemma 2.5. If M = AW is a nonnegative matrix factorization of inner-dimension r, then there 
are $\tilde{A}$ and $\tilde{W}$ such that: 

1. $M = \tilde{A}\tilde{W}$, $\tilde{A}$ and $\tilde{W}$ are nonnegative and have inner-dimension r and 

2. $M = \tilde{A}\tilde{W}$ is stable. 

Proof: The natural approach to prove this lemma is, if M = AW is not stable, to update columns 
in W or rows in A. The only subtle point is that if we update A and W at the same time to 
$\tilde{A}$ and $\tilde{W}$, we may not have $M = \tilde{A}\tilde{W}$. So the approach is to update only one of these two at a 
time, to preserve that $M = A\tilde{W}$ or $M = \tilde{A}W$, and then update the other. Suppose we update W 
to $\tilde{W}$ first. Then for a row $M^j$, the lexicographically first subset of admissible rows (for $M^j$) is 
defined with respect to $\tilde{W}$ and not W, i.e. a subset V of rows is admissible if $M^j \in \mathrm{aff}(\tilde{W}^V)$. 

Throughout our updating process, we will make use of a potential function to ensure that this 
process terminates. To each row of A and to each column of W, we will associate a subset of [r] 
corresponding to the support of the vector. Whenever we update either a row of A or a column of 
W, the support will only ever move earlier according to the lexicographic ordering. 
So, now we can define our updating procedure. We start with M = AW, and each update 
phase will alternately be an A-updating phase or a W-updating phase. In a W-updating phase, for 
each column $M_i$ let $S_i$ be the lexicographically first subset of columns of A that is admissible for 
$M_i$. If $S_i$ is lexicographically (strictly) earlier than the support of $W_i$, we find a vector $\tilde{W}_i$ that is 
nonnegative, supported in $S_i$ and satisfies $M_i = A\tilde{W}_i$. If not, we set $\tilde{W}_i = W_i$. In either case, 
we have that $M_i = A\tilde{W}_i$ and hence $M = A\tilde{W}$. At the end of this phase, we overwrite W with $\tilde{W}$. 

The A-updating phase is defined analogously, and throughout this procedure we maintain the 
invariant that M = AW and A and W are nonnegative and have inner-dimension r. Note that the 
supports of columns of W and rows of A are monotonically decreasing according to the lexicographic 
ordering, and if either A or W is updated (any row of A or any column of W), one support 
must have strictly decreased according to the lexicographic ordering so this updating procedure 
terminates with M = AW that are nonnegative, have inner-dimension r and are also stable. ■ 
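
The ordering on supports that drives this potential argument (smaller sets first, ties broken lexicographically) can be sketched in a few lines of Python. This is an illustration only: indices are 0-based, the names subset_key, all_subsets_in_order and support are hypothetical, and breaking ties by comparing sorted index lists is our reading of the "standard lexicographic ordering" on sets of equal size.

```python
from itertools import combinations

def subset_key(S):
    """Sort key for the ordering: smaller sets first, ties broken
    lexicographically on the sorted list of indices."""
    return (len(S), sorted(S))

def all_subsets_in_order(r):
    """All subsets of {0, ..., r-1}, listed in the ordering used in the text."""
    subsets = [set(c) for k in range(r + 1) for c in combinations(range(r), k)]
    return sorted(subsets, key=subset_key)

def support(v):
    """Indices of the nonzero coordinates of the vector v."""
    return {i for i, x in enumerate(v) if x != 0}

# An update that replaces a column with support {0, 2} by one with
# support {0, 1} moves strictly earlier in the ordering.
moved_earlier = subset_key(support([3, 1, 0])) < subset_key(support([2, 0, 1]))
```

Because there are only finitely many subsets of [r] and every update moves some support strictly earlier, the number of updates is bounded, which is the termination argument used in the proof.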

2.2 Few Entries Determine A and W 

Throughout this section, let M = AW be a stable nonnegative matrix factorization. 

The goal in this section is to demonstrate that (given M), only a few entries in A and W are 
needed to determine the remaining entries. This is only a property of stable factorizations, and is 
not guaranteed to hold for general factorizations. 

Let rank(A) = s and let $U \subseteq [m]$ be a set of s linearly independent rows in A. Furthermore, let 
$S_1, S_2, \ldots, S_p \subseteq [r]$ be the (full) list of sets of s linearly independent columns of A (in lexicographic 
order). Note that $p \leq \binom{r}{s} \leq 2^r$. 

Definition 2.6. The ensemble of A (at U) is a list of linear transformations: $B_1, B_2, \ldots, B_p$ where 
for each i, $B_i$ is an r x s matrix that is zero on all rows outside the set $S_i$ and restricted to rows 
in $S_i$, is $(A^U_{S_i})^{-1}$. 

Note that each submatrix $A^U_{S_i}$ is indeed invertible: rank(A) = s and U is a set of s linearly 
independent rows so a set $S_i$ of columns of A is linearly independent if and only if these vectors 
restricted to U are also linearly independent. 

The main goal in this section is to show: 

Lemma 2.7. For each column $M_i$, among the set of vectors 

$\mathcal{S} = \{B_1 M_i^U, B_2 M_i^U, \ldots, B_p M_i^U\}$ 

$W_i$ is the unique vector with lexicographically minimal support among all nonnegative vectors in 
the set $\mathcal{S}$. 

We will break this lemma up into two parts: 

Lemma 2.8. $W_i$ is contained in the set $\mathcal{S}$. 

Proof: Let $R_i$ be the support of $W_i$. Then $R_i$ must correspond to a linearly independent set of 
columns of A; otherwise we could find a nonnegative $\tilde{W}_i$ whose support is a strict subset of $R_i$ 
such that $A\tilde{W}_i = M_i$, but this would violate the condition of stability. 

Because the sets of linearly independent columns of A are a matroid, there is a set $S_{i'}$ of s 
linearly independent columns of A for which $R_i \subseteq S_{i'}$. Hence 

$B_{i'} M_i^U = B_{i'} (A W_i)^U = B_{i'} A^U W_i = v.$ 

However, $B_{i'}$ is zero on rows outside the set $S_{i'}$ and restricting $B_{i'} A^U$ to rows and columns in $S_{i'}$ 
is the s x s identity matrix. Since the support of $W_i$ is contained in $S_{i'}$, we have $W_i = v$. ■ 
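
Definition 2.6 and Lemma 2.8 can be illustrated on a tiny instance. The Python sketch below builds the ensemble of a hypothetical 2 x 3 matrix A at U (with 0-based indices) using exact rational arithmetic, and applies each $B_i$ to $M_i^U$. The matrix, the column (3, 1) and the helper names (inverse, ensemble, apply) are assumptions made for this example only.

```python
from fractions import Fraction
from itertools import combinations

def inverse(R):
    """Inverse of a square matrix by Gauss-Jordan elimination over the
    rationals; returns None if R is singular."""
    s = len(R)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(s)]
           for i, row in enumerate(R)]
    for col in range(s):
        pivot = next((i for i in range(col, s) if aug[i][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for i in range(s):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [row[s:] for row in aug]

def ensemble(A, U, s):
    """List of pairs (S_i, B_i) over all sets S_i of s linearly independent
    columns of A, in lexicographic order. B_i is r x s, zero on rows outside
    S_i, and equal to the inverse of A restricted to rows U, columns S_i."""
    r = len(A[0])
    result = []
    for S in combinations(range(r), s):
        inv = inverse([[A[u][c] for c in S] for u in U])
        if inv is None:
            continue  # the columns in S are linearly dependent
        B = [[Fraction(0)] * s for _ in range(r)]
        for pos, c in enumerate(S):
            B[c] = inv[pos]
        result.append((S, B))
    return result

def apply(B, x):
    """Matrix-vector product."""
    return [sum(b * xi for b, xi in zip(row, x)) for row in B]

# Hypothetical instance: rank(A) = 2, U = both rows, and the column M_i = (3, 1).
A = [[1, 0, 1], [0, 1, 1]]
candidates = [apply(B, [3, 1]) for _, B in ensemble(A, [0, 1], 2)]
```

In this instance the candidates are (3, 1, 0), (2, 0, 1) and (0, -2, 3). The stable choice $W_i = (3, 1, 0)$, whose support {0, 1} is the earliest admissible set for (3, 1), indeed appears among them.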

We note a corollary of this lemma that will be useful later: 
Corollary 2.9. The support of $W_i$ corresponds to a linearly independent set of columns in A. 

Next, we prove the second part needed for the main result in this section: 
Lemma 2.10. For each vector $B_{i'} M_i^U$, $A B_{i'} M_i^U = M_i$. 

Proof: Let $v = A B_{i'} M_i^U$. We prove this lemma in two parts: first we prove that $v^U = M_i^U$ and 
then we prove the full lemma from this. Since $B_{i'}$ is zero on rows outside the set $S_{i'}$, we have 

$A B_{i'} = A_{S_{i'}} (B_{i'})^{S_{i'}} = A_{S_{i'}} (A^U_{S_{i'}})^{-1}.$ 

Hence $v^U = A^U_{S_{i'}} (A^U_{S_{i'}})^{-1} M_i^U = M_i^U$. 

Consider a j outside the set U. By the choice of U, the row $A^j$ can be expressed as a linear 
combination of rows in A in the set U: 

$A^j = \sum_{j' \in U} \alpha_{jj'} A^{j'}.$ 

Since $AW_i = M_i$, we have $M_i^j = A^j W_i = \sum_{j' \in U} \alpha_{jj'} A^{j'} W_i = \sum_{j' \in U} \alpha_{jj'} M_i^{j'}$ and hence: 

$v^j = A^j B_{i'} M_i^U = \sum_{j' \in U} \alpha_{jj'} A^{j'} B_{i'} M_i^U = \sum_{j' \in U} \alpha_{jj'} v^{j'} = \sum_{j' \in U} \alpha_{jj'} M_i^{j'} = M_i^j.$ 

■ 

Now we can prove the main lemma in this section: 

Proof: We have already shown (Lemma 2.8) that $W_i$ occurs in the set $\mathcal{S}$. Consider any other 
nonnegative vector $B_{i'} M_i^U = v$. We need to show that the support of v is lexicographically later 
than the support of $W_i$. 

First, we claim that if $v \neq W_i$ then the support of $W_i$ is not the same as the support of v. 
Suppose not, i.e. $v \neq W_i$ and yet the support of v and of $W_i$ are identical (let this set be R). 
Indeed R must correspond to a linearly independent set of columns of A (Corollary 2.9). Hence we 
cannot have $A(v - W_i) = 0$ with $v - W_i \neq 0$ and the support of $v - W_i$ contained 
in R; yet Lemma 2.10 gives $Av = M_i = AW_i$, a contradiction. 

So the supports of $W_i$ and v are not identical and one of these must be lexicographically earlier. 
Suppose (for contradiction) that the support of v is earlier. Since v is nonnegative and $Av = M_i$ 
(Lemma 2.10), the support of v is an admissible set of columns of A for $M_i$. This contradicts stability 
(because we could update $W_i$ to v), and so we can conclude that the support of $W_i$ is lexicographically earlier. ■ 

Let rank(W) = t and let V be a set of t linearly independent columns of W. Then we can 
define an ensemble $C_1, C_2, \ldots, C_q$ for W at V analogously as we did for A. Similarly, we have $q \leq \binom{r}{t}$ 
and for all j, among the set 

$\mathcal{T} = \{M^j_V C_1, M^j_V C_2, \ldots, M^j_V C_q\}$ 

$A^j$ is the vector with lexicographically minimal support among all nonnegative vectors in $\mathcal{T}$ (this 
follows from the above proof by interchanging the roles of A and W). 

2.3 A Semi-Algebraic Set, Take 1 

Our goal is to encode the question of whether or not $\mathrm{rank}_+(M) \leq r$ as a non-emptiness problem for 
a semi-algebraic set with a small number of variables. Our first attempt will be to choose the entries 
in $B_1, B_2, \ldots, B_p$ and $C_1, C_2, \ldots, C_q$ as the variables. Our first goal is to construct a set of polynomial 
constraints (using the variables) so that setting $B_1, B_2, \ldots, B_p$ and $C_1, C_2, \ldots, C_q$ to the ensembles of a 
stable factorization M = AW is a valid solution. We then show (conversely) that any valid setting 
of the variables in fact yields a nonnegative matrix factorization with inner-dimension r. 

Suppose we are given the sets U and V, and the ensembles $B_1, B_2, \ldots, B_p$ and $C_1, C_2, \ldots, C_q$. 

Definition 2.11. Let $\mathrm{first}(\mathcal{S})$ applied to a collection of vectors $\mathcal{S}$ output the vector with lexicographically 
minimal support among all nonnegative vectors in $\mathcal{S}$. 

This function can output FAIL if there is no nonnegative vector in $\mathcal{S}$. 
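
A direct rendering of this definition in Python might look as follows. This is a sketch under stated assumptions: supports are compared by size first and then lexicographically (the ordering fixed in Section 2.1), the string "FAIL" stands in for the failure output, and when several nonnegative vectors share the earliest support this sketch simply returns the first one in the list.

```python
def support(v):
    """Sorted tuple of the indices of the nonzero coordinates of v."""
    return tuple(i for i, x in enumerate(v) if x != 0)

def first(vectors):
    """Return the nonnegative vector whose support comes earliest in the
    size-then-lexicographic ordering, or "FAIL" if none is nonnegative."""
    nonnegative = [v for v in vectors if all(x >= 0 for x in v)]
    if not nonnegative:
        return "FAIL"
    return min(nonnegative, key=lambda v: (len(support(v)), support(v)))
```

For example, among (3, 1, 0), (2, 0, 1) and (0, -2, 3) the last vector is discarded for having a negative entry, and (3, 1, 0) is returned because its support {0, 1} precedes {0, 2}.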

Claim 2.12. Set: 

$W_i \leftarrow \mathrm{first}(\{B_1 M_i^U, B_2 M_i^U, \ldots, B_p M_i^U\})$ (1) 

$A^j \leftarrow \mathrm{first}(\{M^j_V C_1, M^j_V C_2, \ldots, M^j_V C_q\})$ (2) 

There is an explicit Boolean function P that determines if for all i and j: 1. $W_i \geq 0$, 2. $A^j \geq 0$ 
and 3. $A^j W_i = M_i^j$. Furthermore, P is a function of sign constraints on the polynomials: 

1. $B_{i'} M_i^U$ (for all i, i') 

2. $M^j_V C_{j'}$ (for all j, j') and 

3. $M^j_V C_{j'} B_{i'} M_i^U - M_i^j$ (for all i, j, i', j'). 

This claim is immediate, but we include a description of the Boolean function P for completeness. 

Proof: The Boolean function P will be an AND over subfunctions $P_{i,j}$ defined for each i and j: 
$P_{i,j}$ will compute the indices i' and j' so that $B_{i'} M_i^U$ and $M^j_V C_{j'}$ are lexicographically earliest among 
nonnegative vectors in the sets $\mathcal{S} = \{B_1 M_i^U, B_2 M_i^U, \ldots, B_p M_i^U\}$ and $\mathcal{T} = \{M^j_V C_1, M^j_V C_2, \ldots, M^j_V C_q\}$ 
respectively. This can be computed from only the signs of entries in the vectors in these sets. 

Then $P_{i,j}$ will check that for this i' and j', $M^j_V C_{j'} B_{i'} M_i^U = M_i^j$. If there is no nonnegative 
vector in either $\mathcal{S}$ or $\mathcal{T}$, or there are two or more nonnegative vectors tied for lexicographically 
earliest support (among only nonnegative vectors) then $P_{i,j}$ will output FAIL. ■ 

Lemma 2.13. P will output PASS when $\{B_{i'}\}_{i'}$ and $\{C_{j'}\}_{j'}$ are chosen as the ensembles of a 
stable factorization M = AW. 

Proof: This follows immediately from Lemma 2.7. In particular, Lemma 2.7 establishes 
uniqueness (i.e. the vector with lexicographically earliest support among all nonnegative vectors 
is unique) and hence each $P_{i,j}$ will not prematurely output FAIL for these choices of $\{B_{i'}\}_{i'}$ and 
$\{C_{j'}\}_{j'}$. ■ 

Next, we prove the converse direction: 

Lemma 2.14. If P outputs PASS, then A and W (as defined in (1) and (2)) are a nonnegative matrix 
factorization of inner-dimension r. 

Note that this factorization is not necessarily stable. 

Proof: We have that $W_i$ and $A^j$ are nonnegative (otherwise P would have output FAIL) and P 
explicitly checks that $A^j W_i = M_i^j$ and hence M = AW. Note that $B_{i'}$ and $C_{j'}$ are r x s and t x r 
dimensional, so M = AW does indeed have inner-dimension r. ■ 

Combining Lemma 2.13 and Lemma 2.14, we have 

Theorem 2.15. P outputs PASS for some choice of s, t, U, V, p and q and some setting of the 
variables $B_1, B_2, \ldots, B_p$ and $C_1, C_2, \ldots, C_q$ if and only if $\mathrm{rank}_+(M) \leq r$. 

This leads to a natural approach for computing the nonnegative rank: 

1. Guess s = rank(A), t = rank(W) (for some stable factorization M = AW) 

2. Guess U and V 

3. Guess $p \leq \binom{r}{s}$ and $q \leq \binom{r}{t}$ 

4. Define a semi-algebraic set where the entries of $B_1, B_2, \ldots, B_p$ and $C_1, C_2, \ldots, C_q$ are variables 
(using the Boolean function P) 

5. Run an algorithm for deciding if the semi-algebraic set is non-empty (e.g. [13]) 

The best algorithms for deciding if a semi-algebraic set is non-empty run in time 

$(\#\,\text{polynomials} \times D)^{O(k)}$ 

where D is the maximum degree and k is the number of variables. This bound is close to (optimal) 
bounds on the number of sign configurations of a set of polynomials with maximum degree D and 
k variables. These bounds are due to a number of authors, but are often referred to as Milnor-Warren 
bounds. Indeed the main bottleneck in algorithms for determining non-emptiness for a 
semi-algebraic set is just the time needed to enumerate all of these sign configurations (and make 
an oracle call to the Boolean function for each one). 

In the approach above, there are r(p + q) + mnpq polynomials of degree at most two in the 
variables. Of these, r(p + q) constraints are due to nonnegativity and mnpq constraints are used to ensure 
that M = AW. However, the drawback of the above approach is that the number of variables is 
large. 

There are rsp + rtq variables, and indeed p and q can be exponential in r. For example, if we 
take the columns of A to be vertices of the cross-polytope (in r/2 dimensions), then we do in fact 
need exponentially many simplices (one corresponding to each linear transformation $B_{i'}$) to cover 
the convex hull of the cross-polytope just by a facet-counting argument. 

Hence, the running time of the above algorithm will be doubly exponential in r. However, we 
will be able to reduce the number of variables in this semi-algebraic set to polynomial in r (and 
we emphasize that this is possible only for the semi-algebraic set we defined here, not for the semi-algebraic 
set defined in Arora et al [1]). The definition of stability is somewhat delicate, but this is 
what allows us to get an exponential reduction in the number of variables. 

2.4 A Semi-Algebraic Set, Take 2 

Here we reduce the number of variables in the semi-algebraic set exponentially by exploiting 
algebraic dependence among the matrices in the ensembles. 

Consider the ensemble: $B_1, B_2, \ldots, B_p$ where for each i, there is a linearly independent set $S_i$ of 
s columns of A and $(B_i)^{S_i} = (A^U_{S_i})^{-1}$. Recall Cramer's Rule: 

Lemma 2.16 (Cramer). Let R be an s x s invertible matrix. Then $(R^{-1})^j_i = (-1)^{i+j} \det(R^{-i}_{-j}) / \det(R)$ 
where $R^{-i}_{-j}$ is the matrix R with the i-th row and the j-th column removed. 
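
As a sanity check on Cramer's Rule, the following Python sketch computes an inverse entry by entry from determinants of minors, using exact rational arithmetic. It is an illustration only: the names det, minor and cramer_inverse are hypothetical, indices are 0-based, and the determinant is computed by Laplace expansion, which is adequate only for small s.

```python
from fractions import Fraction

def det(R):
    """Determinant by Laplace expansion along the first row."""
    if len(R) == 0:
        return Fraction(1)  # determinant of the empty matrix
    total = Fraction(0)
    for j in range(len(R)):
        sub = [row[:j] + row[j + 1:] for row in R[1:]]
        total += (-1) ** j * R[0][j] * det(sub)
    return total

def minor(R, i, j):
    """The matrix R with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(R) if k != i]

def cramer_inverse(R):
    """Inverse of R: the entry in row j, column i of the inverse is
    (-1)^(i+j) * det(minor(R, i, j)) / det(R)."""
    s, d = len(R), det(R)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[(-1) ** (i + j) * det(minor(R, i, j)) / d for i in range(s)]
            for j in range(s)]
```

Every entry of the inverse is a ratio of two polynomials in the entries of R with the common denominator det(R), which is the structural fact that the reduction relies on.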

Hence we can instead use a variable for each entry in $A^U$ and each entry in $W_V$. Then the sign of 
$(A^U_{S_{i'}})^{-1} M_i^U$ can be recovered as a Boolean function of signs of degree at most $s^2$ polynomials in 
the entries of $A^U$. (Additionally, we can check whether or not the polynomial $\det(A^U_{S_{i'}})$ is non-zero 
to determine if $S_{i'}$ is linearly independent). 
Similarly a constraint of the form 

$M^j_V C_{j'} B_{i'} M_i^U = M_i^j$ 

can be written as a degree at most $2r^2$ polynomial constraint in the entries of $A^U$ and $W_V$ by 
clearing the denominators by $\det(W^{T_{j'}}_V)$ and $\det(A^U_{S_{i'}})$. 

This new semi-algebraic set has rs + rt variables and has r(p + q) + (p + q) + mnpq polynomials 
of degree at most $2r^2$ (where the additional polynomials are the denominators in Cramer's Rule). 
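
The trade-off between the two semi-algebraic sets can be tabulated directly from the counts stated in Sections 2.3 and 2.4. The Python sketch below does exactly this; the function names take1_size and take2_size and the example parameters are assumptions made for illustration.

```python
from math import comb

def take1_size(m, n, r, s, t, p, q):
    """Section 2.3: the entries of B_1..B_p and C_1..C_q are the variables."""
    return {"variables": r * s * p + r * t * q,
            "polynomials": r * (p + q) + m * n * p * q,
            "max_degree": 2}

def take2_size(m, n, r, s, t, p, q):
    """Section 2.4: only the entries of A^U and W_V are variables."""
    return {"variables": r * s + r * t,
            "polynomials": r * (p + q) + (p + q) + m * n * p * q,
            "max_degree": 2 * r**2}

# Hypothetical parameters: m = n = 10, r = 4, s = t = 2, and the largest
# possible ensembles p = q = C(4, 2) = 6.
p = q = comb(4, 2)
before = take1_size(10, 10, 4, 2, 2, p, q)
after = take2_size(10, 10, 4, 2, 2, p, q)
```

In this example the number of variables drops from 96 to 16, at the cost of only 12 extra polynomials and a maximum degree of 32 instead of 2. Since the running time is exponential only in the number of variables, this is a favorable exchange.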

Note that Lemma 2.14 still implies that if $P$ outputs PASS, then $\mathrm{rank}_+(M) \le r$ and a nonnegative 
matrix factorization of inner-dimension $r$ can be computed from the settings of the variables for 
the valid point in the semi-algebraic set. And Lemma 2.7 still implies that this semi-algebraic set is 
non-empty if $\mathrm{rank}_+(M) \le r$ (since moreover Lemma 2.5 implies that there is a stable factorization). 
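
Whatever procedure produces candidate factors, the defining conditions of a nonnegative matrix factorization of inner-dimension at most $r$ are easy to verify directly. A minimal sketch, assuming matrices are given as lists of rows; the function name and the tolerance parameter `tol` are illustrative and do not come from the paper:

```python
def is_nonnegative_factorization(M, A, W, r, tol=1e-9):
    """Check that A (m x k) and W (k x n) are entrywise nonnegative, k <= r, and A W = M up to tol."""
    m, n = len(M), len(M[0])
    k = len(W)  # inner dimension of the candidate factorization
    if k > r or len(A) != m:
        return False
    if any(len(row) != k for row in A) or any(len(row) != n for row in W):
        return False
    if any(x < 0 for row in A for x in row) or any(x < 0 for row in W for x in row):
        return False
    return all(
        abs(sum(A[i][t] * W[t][j] for t in range(k)) - M[i][j]) <= tol
        for i in range(m)
        for j in range(n)
    )

# Hypothetical example: a rank-one nonnegative matrix and its factors.
M = [[1, 2], [2, 4]]
A = [[1], [2]]
W = [[1, 2]]
ok = is_nonnegative_factorization(M, A, W, r=1)
```

A check of this kind only certifies an upper bound on the nonnegative rank; it says nothing about whether a smaller inner-dimension is possible.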

We can now use this reduction, together with known algorithms for solving systems of polynomial 
inequalities (as described in Section 1.2), to give a nearly optimal algorithm for deciding if $M$ has 
nonnegative rank at most $r$. Additionally, if $\mathrm{rank}_+(M) \le r$ we can also compute the corresponding 
nonnegative factors $A$ and $W$ to within an additive $\delta$ (at the expense of an extra factor of $\log\frac{1}{\delta}$ in 
the running time). In [13], Renegar gave the first algorithm for deciding if a system of polynomial 
inequalities has a solution that runs in time exponential in the number of variables. We note that 
in [14], Renegar extended this algorithm to also return a $\delta$-approximate solution to an algebraic 
formula, and this is the algorithm that we will use to actually compute the factors $A$ and $W$. We 
also note that these algorithms only assume access to an oracle for the Boolean function $P$, and our 
function $P$ is computable in polynomial time. 

Let L denote the maximum bit complexity of any coefficient in M. Then applying the algorithms 
in [13] and [14] with our reduction we obtain: 

Theorem. There is a $\mathrm{poly}(n, m, L)\,(r 4^{r+1} mn)^{cr^2}$ time algorithm for deciding if the nonnegative 
rank of $M$ is at most $r$. Additionally, given $\delta > 0$ (and if $\mathrm{rank}_+(M) \le r$), the algorithm runs 
in time $\mathrm{poly}(n, m, L, \log\frac{1}{\delta})\,(r 4^{r+1} mn)^{cr^2}$ and returns factors $\tilde{A}$ and $\tilde{W}$ that are entry-wise close (within 
an additive $\delta$) to factors $A$ and $W$ (respectively) that form a nonnegative matrix factorization of $M$ of 
inner-dimension at most $r$. Furthermore, the entries of $\tilde{A}$ and $\tilde{W}$ are rational, with 
numerators and denominators bounded in bit length by $O(L (r 4^{r+1} mn)^{cr^2} + \log\frac{1}{\delta})$. 

Alternatively, in the Blum-Shub-Smale (BSS) Model [4] one can instead use the algorithm in [13] 
to decide if $\mathrm{rank}_+(M) \le r$, and the running time of this algorithm is $\mathrm{poly}(n, m) + (r 4^{r+1} mn)^{cr^2}$. 

We emphasize that the above algorithm is based on answering a purely algebraic question: how 
many variables are needed (in a system of polynomial inequalities) to encode the question of whether 
$M$ has nonnegative rank at most $r$? We obtain an exponential improvement in the number of 
variables over the results in [1], and this, coupled with algorithms for computing a solution to a 
system of polynomial inequalities, has an immediate algorithmic implication. The algorithm we 
obtain here is in fact nearly optimal under the Exponential Time Hypothesis (ETH) of Impagliazzo 
and Paturi [9], since Arora et al. [1] showed that an algorithm that decides if $\mathrm{rank}_+(M) \le r$ in 
$(nm)^{o(r)}$ time would imply a sub-exponential time algorithm for 3-SAT. It is somewhat surprising 
that an algorithm for computing the nonnegative rank can be designed based on reasoning about 
systems of polynomial inequalities, and no algorithm (under plausible complexity assumptions) can 
do much better. 
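
To make the gap between the formulations concrete, the variable counts quoted in the abstract can be tabulated for sample parameters. This is only an illustrative sketch: the function name and the sample values of $n$, $m$ and $r$ are assumptions, and the three formulas are the counts stated in the abstract ($nr + mr$ for the naive formulation, $2r^2 2^r$ for [1], and $2r^2$ here).

```python
def variable_counts(n, m, r):
    """Number of variables in each formulation, using the counts quoted in the abstract."""
    return {
        "naive": n * r + m * r,             # one variable per entry of A and W
        "arora_et_al": 2 * r**2 * 2**r,     # independent of n and m, exponential in r
        "this_paper": 2 * r**2,             # independent of n and m, polynomial in r
    }

# Hypothetical sample sizes.
counts = variable_counts(n=1000, m=1000, r=10)
```

Since the algorithms for polynomial inequalities are exponential only in the number of variables, it is this count that determines the exponent in the running time.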

3 Fragile Instances of Nonnegative Rank 

An important property of the rank of a matrix is that if a given matrix M has rank r, there is 
an r x r submatrix of M that also has rank r. Hence, rank admits a small certificate that serves 
as proof that a matrix does indeed have rank at least r and this fact plays a crucial role in many 
applications. 

Here, we give highly fragile instances of nonnegative rank: we give a (nonnegative) matrix 
$M$ of dimension $n \times n$ with $\mathrm{rank}_+(M) = 4r$, yet for any submatrix $N$ of at most $\frac{n}{3r}$ columns 
of $M$, $\mathrm{rank}_+(N) \le 3r$. To put this result in context, consider a system of linear inequalities in 
$d$ dimensions that is infeasible. A basic result in discrete geometry [11] is that there is a subset 
of at most $d + 1$ of the linear inequalities that is infeasible. In Section 2.4, we gave a system of 
polynomial inequalities in $2r^2$ dimensions that has a solution if and only if $\mathrm{rank}_+(M) \le r$. One 
might hope that this system is infeasible if and only if there is a small subset of the inequalities 
that alone is infeasible, and that this would yield a subset of (say) the columns of $M$ that "proves" 
that $\mathrm{rank}_+(M) > r$. Yet this is not the case, and systems of polynomial inequalities do not have the 
"Helly Property" [11] (indeed their individual constraints do not necessarily correspond to convex 
regions). 

To give fragile instances of nonnegative rank, we will make use of a series of reductions of 
Vavasis [17] and a particular gadget in Arora et al. [1]. In fact, we make use of a crucial property of 
the reduction in [17] from nonnegative rank to the intermediate simplex problem: in a sense, 
rows of $M$ are mapped to points and columns of $M$ are mapped to constraints when reducing to 
the intermediate simplex problem. We will only be interested in the intermediate simplex problem 
in two dimensions: 

Definition 3.1. An instance of the intermediate polygon problem is a polygon $P \subset \mathbb{R}^2$ and a set 
$S \subset P$ of $|S| = n$ points. The goal is to find a triangle $T$ with $S \subset T \subset P$, in which case we call 
this a YES instance, and otherwise we call it a NO instance. 

Our goal is to construct an explicit instance of this problem that is a NO instance and yet 
restricting to any set $S' \subset S$ of at most $\frac{n}{3}$ points is a YES instance. We accomplish this 
by noticing that a particular gadget used in [1] (with a slight modification) has exactly this 
property. We will then be able to use this instance of the intermediate simplex problem as a gadget 
to construct fragile instances of nonnegative rank. 

We will begin with some simple geometric lemmas and definitions. 

Definition 3.2. Let $C_d = \{(x, y) \mid x^2 + y^2 \le d^2\}$ (the closed disc of radius $d$), and we will write $C$ for $C_1$. Let $o$ denote the origin. 

Definition 3.3. Let $E$ be the set of all equilateral triangles $T \subset C$ where the vertices of $T$ are on 
the boundary of $C$. 

In our arguments, we will also make use of the (largest) inner circle c that is contained in all 
triangles in E. Equivalently, this circle is the intersection of all triangles in E: 

Definition 3.4. Let $c = \bigcap_{T \in E} T = C_d$, where $d$ is defined as follows: for an arbitrary $T \in E$, $d$ is the 
minimum distance from the boundary of $T$ to the origin. 

Our instance of the intermediate polygon problem will be an intersection of $n$ triangles $T$, each 
in the set $E$. The common intersection of these triangles will contain $c$, and next we prove that 
any triangle (contained in $C$) that contains $c$ must in fact be equilateral. This will help us 
reason about what sorts of triangles can make our instance a YES instance. The following two 
lemmas are proved in [1], but we include the proofs here for completeness. 
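
The radius $d$ in Definition 3.4 does not depend on which triangle in $E$ is used, and for the unit circle it equals $\frac{1}{2}$. A small numerical sketch of this fact; the function name and the parametrization of a triangle by the angle `phi` of one of its vertices are assumptions made for illustration:

```python
import math

def min_boundary_distance(phi):
    """Distance from the origin to the boundary of the equilateral triangle
    inscribed in the unit circle with one vertex at angle phi."""
    verts = [
        (math.cos(phi + 2 * math.pi * k / 3), math.sin(phi + 2 * math.pi * k / 3))
        for k in range(3)
    ]
    dists = []
    for k in range(3):
        (x1, y1), (x2, y2) = verts[k], verts[(k + 1) % 3]
        # Distance from the origin to the line through two vertices; the closest
        # point is the midpoint of the edge, so this is also the distance to the edge.
        dists.append(abs(x1 * y2 - x2 * y1) / math.hypot(x2 - x1, y2 - y1))
    return min(dists)

# Hypothetical sample rotations.
samples = [min_boundary_distance(phi) for phi in (0.0, 0.3, 1.1, 2.5)]
```

Each value in `samples` agrees with $\frac{1}{2}$ up to floating-point error, matching the inradius of an equilateral triangle with circumradius 1.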

Lemma 3.5. [1] Any triangle $T$ with $c \subset T \subset C$ must be in the set $E$. 

Proof: Consider a triangle $T$ with $c \subset T \subset C$. Let $e_1$, $e_2$ and $e_3$ be the three edges of $T$ and 
let $\theta_1$, $\theta_2$ and $\theta_3$ be the viewing angles from the origin $o$; namely, $\theta_i$ is the angle $\angle a_i o b_i$ 
where $a_i$ and $b_i$ are the endpoints of $e_i$. 

Since $o \in T$, we have that $\theta_1 + \theta_2 + \theta_3 = 2\pi$. Consider an edge $e_i$. We will prove, by contradiction, 
that $e_i \cap c$ must contain exactly one point (i.e. $e_i$ must be tangent to the circle $c$). Suppose not; 
since $c \subset T$, we must have that $e_i \cap c = \emptyset$. Then let $\ell$ be the line parallel to $e_i$ that is tangent to $c$. 
Let $e_i'$ be the intersection of $\ell$ with $C$. The viewing angle $\theta_i'$ of $e_i'$ is strictly larger than $\theta_i$, yet the 
intersection of any line $\ell$ tangent to $c$ with $C$ has viewing angle exactly $\frac{2\pi}{3}$, and hence we conclude 
that $\theta_1 + \theta_2 + \theta_3 < 2\pi$, which is a contradiction. 

So each $e_i$ is tangent to $c$, and in fact we can use a similar argument to conclude that each $e_i$ 
must be exactly the intersection of a line $\ell$ tangent to $c$ with $C$ (otherwise, again we would have 
that $\theta_1 + \theta_2 + \theta_3 < 2\pi$). 

Hence, we conclude that each edge of $T$ has the same length, and each endpoint is on the 
boundary of $C$, so $T \in E$. ■ 
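
The viewing angle used in the proof can be checked by a one-line computation. This is a short derivation under the assumption, which follows from Definition 3.4 for the unit circle, that $c$ has radius $d = \frac{1}{2}$. A chord of the unit circle at distance $d$ from the origin subtends a half-angle of $\arccos(d)$ at the origin, so a chord lying on a tangent line to $c$ has viewing angle

```latex
\[
\theta \;=\; 2\arccos\!\left(\frac{1}{2}\right) \;=\; \frac{2\pi}{3},
\qquad\text{and hence}\qquad 3\theta \;=\; 2\pi .
\]
```

Three such chords therefore account for the full angle $2\pi$ with no slack, which is why any edge with a strictly smaller viewing angle leads to a contradiction.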

Throughout the remainder of this section, consider any finite set $T_1, T_2, \ldots, T_n \in E$ of equilateral 
triangles, and let $S$ be the vertices of $\bigcap_{i=1}^n T_i$. 

Lemma 3.6. [1] Let $T$ be a triangle with $S \subset T \subset C$. Then $T \in \{T_1, T_2, \ldots, T_n\}$. 

Proof: Clearly we have that $\mathrm{conv}(S) \subset T$ since $T$ is convex, and we also have that $c \subset \mathrm{conv}(S) = 
\bigcap_{i=1}^n T_i$. So by Lemma 3.5, we can conclude that $T$ must be in $E$. Suppose that $T \notin \{T_1, T_2, \ldots, T_n\}$. 

Let $\{p_1, p_2, p_3\} = T \cap c$ (i.e. these are the three points on the boundary of $T$ closest to the 
origin). Similarly, for each $T_i$ let $\{p_1^i, p_2^i, p_3^i\} = T_i \cap c$. Then $\{p_1, p_2, p_3\}$ is a rotation (by a nonzero angle less than $\frac{2\pi}{3}$) of 
$\{p_1^i, p_2^i, p_3^i\}$, and hence $p_1, p_2, p_3$ are each strictly in the interior of $T_i$. 

Hence, $p_1, p_2, p_3$ are on the boundary of $\mathrm{conv}(S) \cap T$ but not on the boundary of $\mathrm{conv}(S)$, so 
$T$ cannot contain $\mathrm{conv}(S)$. ■ 

Lemma 3.7. For each edge $e_j$ of a triangle $T_i$, $|e_j \cap S| = 2$, and furthermore for each $s \in S$, $s$ 
intersects the edges of exactly two (distinct) triangles in $\{T_1, T_2, \ldots, T_n\}$. 

Corollary 3.8. $|S| = 3n$. 

Proof: Each edge of $\mathrm{conv}(S)$ is by definition a subsegment of some unique edge $e_j$ of some triangle 
in $\{T_1, T_2, \ldots, T_n\}$. All we need to show is that for each edge $e_j$ (of some triangle in $\{T_1, T_2, \ldots, T_n\}$) 
we can find an edge of $\mathrm{conv}(S)$ which is a subsegment of $e_j$: 

Let $p_j$ be the closest point on $e_j$ to the origin. As we argued in Lemma 3.6, for all other 
triangles, $p_j$ is strictly in the interior. So the ray from the origin to $p_j$ hits the segment $e_j$ first 
(out of all edges of all triangles in $\{T_1, T_2, \ldots, T_n\}$). Hence $p_j$ is on the boundary of $\mathrm{conv}(S)$, but only one edge 
(namely $e_j$) contains $p_j$, so the edge of $\mathrm{conv}(S)$ that contains $p_j$ is a subsegment of $e_j$, as desired. 
■ 
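
Lemma 3.7 and Corollary 3.8 can be sanity-checked numerically on a small example. The following brute-force sketch is not taken from the paper: the function names, the tolerance, and the three sample rotations are assumptions. It uses the fact that each edge of a triangle in $E$ lies on a line $u \cdot x = \frac{1}{2}$ for a unit vector $u$, with the triangle on the side $u \cdot x \le \frac{1}{2}$.

```python
import math
from itertools import combinations

def edge_lines(rotations):
    """Return (ux, uy, triangle_index) for the 3n edge lines u . x = 1/2."""
    lines = []
    for idx, phi in enumerate(rotations):
        for k in range(3):
            a = phi + 2 * math.pi * k / 3
            lines.append((math.cos(a), math.sin(a), idx))
    return lines

def intersection_vertices(lines, tol=1e-9):
    """Intersect every pair of edge lines and keep the points lying in all triangles."""
    verts = []
    for (a1, b1, t1), (a2, b2, t2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel lines do not meet
        # Solve a1 x + b1 y = 1/2 and a2 x + b2 y = 1/2 by Cramer's Rule.
        x = 0.5 * (b2 - b1) / det
        y = 0.5 * (a1 - a2) / det
        if all(a * x + b * y <= 0.5 + tol for a, b, _ in lines):
            verts.append((x, y, t1, t2))
    return verts

# Hypothetical instance with n = 3 pairwise distinct triangles.
rotations = [0.0, 0.3, 0.7]
S = intersection_vertices(edge_lines(rotations))
```

For this instance the list `S` has $3n = 9$ points, and the two triangle indices recorded with each point differ, in line with Lemma 3.7.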

As we noted, the gadget that we use here is a slight modification of the one in [1], and the 
modification that we need involves rescaling: 

Definition 3.9. For each triangle $T \in E$, define $T^{(1-\epsilon)}$ as the scaling down of $T$ such that the 
vertices of $T^{(1-\epsilon)}$ are on the boundary of $C_{1-\epsilon}$. 

This rescaling is precisely what ensures that the original instance is a NO instance, but as we 
will see, if $\epsilon$ is sufficiently small then every small subset of $S$ is a YES instance. 

Definition 3.10. Let $S_i$ be the vertices of $\mathrm{conv}(S) \cap T_i^{(1-\epsilon)}$. 

Claim 3.11. If $\epsilon$ is sufficiently small, then $S_i = S - (T_i \cap S)$. 

Proof: Recall that $\mathrm{conv}(S) = \bigcap_{i=1}^n T_i$. Consider an edge $e_j$ of $T_i$. Using Lemma 3.7, $|e_j \cap S| = 2$, and 
we can choose $\epsilon$ small enough such that the region strictly between $e_j^{(1-\epsilon)}$ (namely, the corresponding 
edge in $T_i^{(1-\epsilon)}$) and $e_j$ does not contain any points in $S$, in which case $S_i = S - (T_i \cap S)$. ■ 

So consider the following instance of the intermediate polygon problem: 

• Let $P = \mathrm{conv}(\text{vertices in } \bigcup_{i=1}^n T_i^{(1-\epsilon)})$ 

• and let $S$ = vertices of $\bigcap_{i=1}^n T_i$. 

Claim 3.12. $(P, S)$ is a NO instance. 

Proof: $P \subset C_{1-\epsilon}$ by the definition of $T^{(1-\epsilon)}$, and using Lemma 3.5, any triangle $T$ contained in $C$ 
that contains $S$ must be in the set $E$. Since any triangle in $E$ has its vertices on the boundary 
of $C$, we conclude that $T$ is not contained in $C_{1-\epsilon}$ and hence $(P, S)$ is indeed unsatisfiable. ■ 

Lemma 3.13. For any $S' \subset S$ with $|S'| < n$, $(P, S')$ is a YES instance. 

Proof: Using Lemma 3.7, each $s \in S$ intersects exactly two edges of triangles in $\{T_1, T_2, \ldots, T_n\}$, so 
if $|S'| < n$, there must be a triangle $T_i$ for which $T_i \cap S' = \emptyset$. 

Consider $T_i^{(1-\epsilon)}$: using Claim 3.11, we conclude that $T_i^{(1-\epsilon)} \cap S' = S' - (T_i \cap S') = S'$. And we 
have that $T_i^{(1-\epsilon)} \subset C_{1-\epsilon}$, so $(P, S')$ is indeed satisfiable. ■ 

We use the following lemma from Vavasis: 

Lemma 3.14. [17] Let $\mathrm{rank}(M) = r$, and let $M = UV$ where $U$ and $V$ have $r$ columns and rows 
respectively. Then $\mathrm{rank}_+(M) = r$ if and only if there is an invertible $r \times r$ matrix $Q$ such 
that $UQ^{-1}$ and $QV$ are both nonnegative. 
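
The "if" direction of Lemma 3.14 is a one-line computation, spelled out here for convenience (the names $A$ and $W$ for the two factors are chosen only to match the notation used elsewhere for nonnegative factorizations):

```latex
\[
M \;=\; UV \;=\; U\,Q^{-1}Q\,V \;=\; (UQ^{-1})(QV) \;=\; AW,
\qquad A := UQ^{-1} \ge 0, \quad W := QV \ge 0 .
\]
```

So $A$ and $W$ form a nonnegative factorization of inner-dimension $r$, and since $\mathrm{rank}_+(M) \ge \mathrm{rank}(M) = r$ always holds, this gives $\mathrm{rank}_+(M) = r$.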

We could use the reduction in [17] from nonnegative rank to the intermediate simplex problem, 
but there is a technical issue that arises. Here, we give a slight modification of this reduction that 
avoids this issue: 

Consider the plane $F = \{(x, y, z) \mid x + y + z = 1\}$. Map $P$ to this plane so that $P$ is contained 
in the nonnegative orthant (scale down $P$, if need be), and let the nonnegative hull of vectors in $P$ 
and the origin be denoted by the cone $S$. 

Let $C = \{x \mid Ax \ge 0\}$ and set the rows of $U$ to be vertices of $F \cap S$ and let $V = A^T$. Note that 
the vertices of $F \cap C$ are just the three-dimensional coordinates corresponding to the points in $S$. 
Note that $UV$ is a nonnegative matrix, since each vertex of $F \cap C$ is contained in the cone $S$. This 
reduction is essentially the one in [17] but with a minor change to avoid a certain technical issue 
that would arise otherwise. 

Lemma 3.15. There is an invertible $r \times r$ matrix $Q$ such that $UQ^{-1}$ and $QV$ are both nonnegative 
if and only if $(P, S)$ is a YES instance. 

Proof: Suppose $(P, S)$ is a YES instance. Let the rows of $Q$ be the three-dimensional coordinates 
of the vertices of the triangle $T$ (i.e. these are the vectors on the plane $F$). These points are in the 
cone $C$, so $QV$ is nonnegative. Furthermore, $S \subset T$, so each row of $U$ is in the convex hull of the rows 
of $Q$ and $UQ^{-1}$ is nonnegative. 

Conversely, consider an invertible $Q$ for which $UQ^{-1}$ and $QV$ are both nonnegative. For each 
row in $Q$, let $p_i$ be the intersection of the ray through the origin and the row in $Q$ with $F$. We have $p_i \in C$, 
so the associated two-dimensional point is in $P$. Furthermore, each row of $U$ is in the nonnegative 
hull of $\{p_1, p_2, p_3\}$, and each $p_i$ and each row in $U$ has nonnegative entries and the sum of the entries 
is one. Hence each $p_i$ and each row in $U$ has unit $\ell_1$ norm. So each row of $U$ is in the convex hull 
of $\{p_1, p_2, p_3\}$, and so the associated two-dimensional triangle contains $S$. ■ 
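
The correspondence between "a point lies in the triangle" and "the corresponding row of $UQ^{-1}$ is nonnegative" can be illustrated on a toy example. This is a minimal sketch under stated assumptions: the lifting $(x, y) \mapsto (x, y, 1 - x - y)$ is just one convenient map onto the plane $F$, and the function names and sample points are hypothetical.

```python
from fractions import Fraction

def det3(a, b, c):
    """Determinant of the 3 x 3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def lift(p):
    """Lift a 2D point to the plane x + y + z = 1 (an assumed choice of map)."""
    x, y = Fraction(p[0]), Fraction(p[1])
    return (x, y, 1 - x - y)

def row_of_U_Q_inverse(point, triangle):
    """Solve lam * Q = u by Cramer's Rule, where the rows of Q are the lifted
    triangle vertices and u is the lifted point."""
    q1, q2, q3 = (lift(v) for v in triangle)
    u = lift(point)
    d = det3(q1, q2, q3)
    return (det3(u, q2, q3) / d, det3(q1, u, q3) / d, det3(q1, q2, u) / d)

# Hypothetical triangle and test points.
triangle = [(0, 0), (1, 0), (0, 1)]
inside = row_of_U_Q_inverse((Fraction(1, 4), Fraction(1, 4)), triangle)
outside = row_of_U_Q_inverse((1, 1), triangle)
```

The coefficients are the barycentric coordinates of the point with respect to the triangle: they sum to one because both the point and the vertices lie on $F$, and they are all nonnegative exactly when the point is in the triangle.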

Note that in this reduction, rows of $M = UV$ are mapped one-to-one to points in $S$ and columns 
of $M$ are mapped one-to-one to facets of $P$. Hence, $(U, V)$ is a NO instance, but any set of fewer than $n$ 
rows of $U$ is a YES instance. 

So $M = UV$ is a nonnegative matrix of dimension $3n \times 3n$ with nonnegative rank $\ge 4$, and yet 
any submatrix of $< n$ rows has nonnegative rank $\le 3$. We can use this matrix $M$ to construct a 
$3rn \times 3rn$ matrix which is block diagonal, with $r$ copies of $M$ along the diagonal. Then: 

Theorem. For any $r \in \mathbb{N}$, there is a $3rn \times 3rn$ nonnegative matrix which has nonnegative rank at 
least $4r$ and yet for any $< n$ rows, the corresponding submatrix has nonnegative rank at most $3r$. 
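
The block-diagonal construction itself is straightforward. A minimal sketch, assuming matrices are lists of rows; the function name is illustrative, and the small matrix below is only a stand-in for the $3n \times 3n$ gadget matrix $M$, which is not constructed here:

```python
def block_diagonal(M, r):
    """Place r copies of the square matrix M along the diagonal, with zeros elsewhere."""
    k = len(M)
    N = [[0] * (r * k) for _ in range(r * k)]
    for b in range(r):
        for i in range(k):
            for j in range(k):
                N[b * k + i][b * k + j] = M[i][j]
    return N

# Hypothetical 2 x 2 stand-in for the gadget matrix.
M_toy = [[1, 2], [3, 4]]
N = block_diagonal(M_toy, r=3)
```

With a $3n \times 3n$ block, the result has dimension $3rn \times 3rn$, as in the theorem.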

An interesting open question is to characterize the family of matrices for which nonnegative rank 
can be certified by a small submatrix, since in many applications it is quite natural to assume that 
the input matrices satisfy these conditions. 